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Abstract. We describe a general strategy for sampling configurations from a given 
(Gibbs-Boltzmann or other) distribution. It is not based on the Metropolis concept 
of establishing a Markov process whose stationary state is the wanted distribution. 
Instead, it builds weighted instances according to a biased distribution. If the bias is 
optimal, all weights are equal and importance sampling is perfect. If not, "population 
control" is applied by cloning/killing configurations with too high/low weight. It uses 
the fact that nontrivial problems in statistical physics are high dimensional. Therefore, 
instances are built up in many steps, and the final weight can be guessed at an early 
stage. In contrast to evolutionary algorithms, the cloning/killing is done such that 
the wanted distribution is strictly observed without simultaneously keeping a large 
population in computer memory. We apply this method (which is also closely related 
to diffusion type quantum Monte Carlo) to several problems of polymer statistics, 
population dynamics, and percolation. 



1 Introduction 

For many statistical physicists, "Monte Carlo" is synonymous for the Metropolis 
strategy [Q where one sets up an ergodic Markov process which has the desired 
Gibbs-Boltzmann distribution as its unique asymptotic state. There exist nu- 
merous refinements concerned with more efficient transitions in the Markov pro- 
cess (e.g. cluster flipsQ] or pivot moves||), or with distributions biased such that 
false minima are more easily escaped from and that autocorrelations are reduced 
(e.g. multicanonical sampling|Q and simulated tempering^]). But most of these 
schemes remain entirely within the framework of the Metropolis strategy. 

On the other hand, stochastic simulations not based on the Metropolis strat- 
egy have been used from early times on. Well known examples are evolutionary 
(in particular genetic) algorithms [^[0,^1, diffusion type quantum Monte Carlo 
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and several algorithms devised for the simulation of long 
12| , |13| , ^4|jr^ , p^| . But these methods were developed indepen- 
dently in different communities and it was not in generally recognized that they 
are realizations of a common strategy. Maybe the first who pointed this out 
clearly were Aldous and Vazirani fl7|| who also coined the name "go with the 
winners" . For later references who also stressed the wide range of possible ap- 
plications of this strategy see Pqjl9| . Ref. [jl9| points even to applications in 
lattice spin systems and Bayesian inference, fields which will not be treated in 
the present review. 

As we shall see, the main drawbacks of the go- with-the- winners strategy are: 
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• The method yields correlated samples, just as the Metropolis method does. 
This makes a priori error estimates difficult |l7{ ]. A posteriori errors, esti- 
mated from fluctuations of measured observables, are of course always possi- 
ble. But they can be very misleading when sampling is so incomplete that the 
really large fluctuations have not yet been seen. However, there is also a pos- 
itive side: more easily than in the Metropolis case one can estimate whether 
this has happened, and whether, therefore, the method gives reliable results 
or not. 

• Efficiency is not guaranteed. The go-with-the-winners strategy allows a lot 
of freedom with respect to implementation details, and its efficiency depends 
on a good choice of these. Thus, there are cases where it has not yet been 
successful at all, while there are other problems where its efficiency is not 
nearly matched by any other method we aware of. On the other hand, the 
flexibility of the general strategy represents a strong positive point. 

Instead of giving a formal definition of the go-with-the-winners strategy, we 
shall present an example from which the basic concepts will become clear. In 
later sections we shall then see how these concepts are implemented in detail 
and how they are applied to other problems as well. 

2 An Example: A Lamb in front of a Pride of Lions 

The example is a very idealized problem from population dynamics (or chemical 
reactions, if you whish) pp| , pl| : consider a 'lamb', represented by a random 
walker on a 1-dimensional lattice x — ... — 1,0,1... with discrete time and 
hopping rate p per time unit, leading to a diffusion constant -Di am b- It starts at 
time t — at x — 0. Together with it, there start also N 'lions', til of them 
at Xi = —1 [i = 1, .. .til) and tir — N — nj, at Xi = +1. They also perform 
random walks, but with a diffusion constant -Diion which may differ from -Diamb- 
Two lions can jump onto the same site without interacting with each other. But 
if a lion and the lamb meet at the same site, the lamb is eaten immediately, 
and the process is finished. Note that both the lamb and the lions are absolutely 
short-sighted and stupid: there is no evading or chasing. It is for this reason 
that the model can also be interpreted as the caption of a diffusing molecule by 
diffusing adsorbers. 

The survival probability P(t) of the lamb up to time t can be estimated easily 
for a single lion, N = 1. In this case the relative distance makes a random walk 
with diffusion constant Z?i am b + -C>iion which starts at Ax = 1, and P(t) is equal 
to the probability that the walk has not yet hit an absorbing wall at Ax = 0. 
This probability is well known to decrease as t~ x / 2 , thus 

P N =i~t-^. (1) 

The problem is less trivial but still solvable for N — 2, (see ^jj and the 
literature quoted there). One finds again a power law 



Pjv=2 ~ t~ 



(2) 
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but with an exponent which depends on the ratio Aamb /Aion and on whether 
both lions are on the same side or on different sides of the lamb. For the first 
case one gets 



a 2 



n 2 Aamb 

2 arccos 

TT -Diamb + Ai< 



N = n R = 2,n L = 0. 



(3) 



For the case with both lions on opposite sides one obtains a similar expression. 
If Aamb = Aion, Eq. (§) reduces to a 2 = 3/4. 

For any N > 2 one can still prove rigorously that asymptotically holds, for 
large t, 

P N ~t~ a \ (4) 

but this time one cannot give closed expressions for a^- Numerical values for the 
case where all lions are on one side have been obtained for several N by direct 
simulation (a 3 w 0.91, rj 1.03, a\Q « 1.4 but these estimates become 

more and more difficult for increasing N because of the exceedingly small chance 
for the lamb to survive sufficiently long to allow precise measurements. 

Such numerical estimates would be welcome in order to test an asymptotic 
estimate for N — > oo plj. In this limit, the location of the outmost of a group 
of lions moves nearly deterministically: If the lion who made the front at time 
t lags behind, there will be another lion who overtakes him, so that the front 
continues to move on with maximal speed. Assuming that the fluctuations in 
the motion of the front can be neglected, the authors of EE found 
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N = n R > l,n L = 0. 



(5) 



In the same spirit, the optimal strategy of a lamb squeezed between N/2 lions 
to its left and N/2 to its right would be to stand still. Assuming that this single 
path dominates in the limit N — > oo, one finds simply apf ~ N. 

As we said, straightforward simulations to check these predictions are ineffi- 
cient. In order to improve the efficiency, one can think of two tricks: 



Trick 1 : Make occasional "enrichment" steps. 



In particular, this might mean that one starts with M ^> 1 instances. As soon as 
the number of surviving instances has decayed to a number < M/2, one makes 
a clone of each instance (note that lambs can be cloned also in reality, but on 
the computer we clone the entire configuration consisting of lamb and lions!). 
This boosts the number of instances again up to ks M, and one can repeat the 
game. One has just to remember how often the sample had been enriched when 
computing survival probabilities, i.e. each instance generated carries a relative 
statistical weight w — 1/2 C , with c the number of cloning steps. 



Trick 2 : Replace the random walks by biased random walks 



Not only should the lamb preferentially run away from the lions, but also the 
lions should run away from the lamb in order to obtain long-lived samples that 
contribute to Eq. (Q). If just this were done without compensation, this would 
of course give wrong results. But we can correct for this bias by giving weights 
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to each instance. For each step (of either the lamb or the lion) that was made 
according to a biased pair of probabilities {pl,Pr} instead of {p,p}, we should 
multiply the weight by a factor pjpn if the actual step was to the right, and 
by a factor pjph if the step was to the left. In this way the weights compensate 
exactly, on average, the fact that not all walks were sampled with the same 
probability. This trick is indeed very general. In any sampling procedure where 
some random move should be done with probability p > in order to obtain 
an unbiased sample, one can replace p by any other probability p' ^ if we at 
the same time use weighted samples and multiply the current sample weight by 
p/p'. 

Actually, in view of the second trick, the first one is clearly not optimal. 
Instead of cloning irrespective of its weight, one would like to clone preferentially 
those configurations which have high weight. Thus wc replace the first trick by 

Trick 1' : Clone only configurations with high weight. 
Choose a cloning threshold W+ (t) . It can be in principle an arbitrary function of 
t, and it need not be kept fixed during the simulation; thus it can be optimized 
on-line. Good choices will be discussed later. If a configuration at some time t 
has weight w > W+(t), it is cloned and both clones are given weight w/2. 

On the other hand, a too strong bias and too frequent cloning could result in 
configurations which have too small weight. Such configurations are just costly 
in terms of CPU time, without adding much to the precision of the result. But 
we are not allowed to kill ( "prune" ) them straigh away, since they do carry some 
weight nevertheless. Instead, we use 

Trick 3 : Kill probabilistically configurations with low weight. 
Choose a pruning threshold W- (t) . The same remarks apply to it as to W+ (t) . 
If w < W-(t), we call a random number r uniformly in [0,1]. If r < 1/2, we 
prune. Otherwise, if r > 1/2, we keep the configuration and double its weight, 
w — > 2w. Again this does not introduce a bias, as far as averages are concerned. 

In principle, that's all. One can modify the tricks 1', 2 and 3 by making more 
than one clone at each enrichment step, by killing with probability 1/2, or by 
letting W± depend also on other variables. Whether such further improvements 
are helpful will depend on the problem at hand, in the case of lamb & lions it 
seems they were not. Indeed, in this problem also pruning was not needed if the 
bias was not too strong, but this is somewhat special. 

Before going on and describing the detailed implementation, let us just see 
some results. Probabilities i-W(i) for all lions at the same side and the result- 
ing decay exponents are shown in Fig.l, for N up to 50. We see that Eq. ([|) is 
qualitatively correct in predicting a logarithmic increase of ajv, but not quantita- 
tively. Obviously, fluctuations of the front of the pride of lions are not negligible. 
The data on the right panel show a slight downward curvature. This might be 
an indication that Eq. (||) is asymptotically correct, but then asymptotia would 
set in only at extremely large values of N. The same conclusion is reached when 
N/2 lions are on either side, as seen from Fig. 2. In that case the raw data, shown 
in the left panel, clearly demonstrate the power of the algorithm: We are able 
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Fig. 1. Left panel: Survival probabilities for a lamb starting next to N lions, all of 
whom are on the same side. Lamb and lions both make ordinary random walks with 
Du on — -Diamb = 1/2. Right panel: Corresponding decay exponents. The lower dashed 
line represents the prediction from Eq. 



N/2 lions on each s 



N/2 lions on each side 



Fig. 2. Same as Fig.l but for N/2 lions on each side of the lamb. This time neglecting 
fluctuations of the front of the group of lions would give ajv = N. 

to obtain reliable estimates of probabilities as small as 10 -50 , which would have 
been impossible with straightforward simulation. 

3 Other Examples 

3.1 Multiple Spanning Percolation Clusters 

Let us now consider percolation [^2| on a large but finite rectangular lattice in 
any dimension 2 < d < 6. We single out one direction as "spanning direction" . In 
this direction boundary conditions are open (surface sites just have no neighbours 
outside the lattice), while boundary conditions in the other direction(s) might 
be either open or periodic. Up to some six years ago there was a general believe, 
based on a misunderstood theorem, that there is at most one spanning cluster 
in the limit of large lattice size, keeping the shape of the rectangle fixed (Lj = 
XfL, L — > co, i = 1, . . . d). A 'spanning cluster' is a cluster which touches both 
boundaries in the spanning direction. 
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Since there is no spanning cluster for subcritical percolation (with probability 
decreasing exponentially in the lattice size L), and since there is exactly one in 
the supercritical case, the relevant case is only critical percolation. For that case 
it is now known that the probabilities to have exactly fc spanning clusters 
are all non-zero in the limit L — > oo. In two dimensions they were calculated by 
Cardy exactly using conformal invariance p3| , but in dimensions > 3 no exact 
results are known. The only analytical 'result' is a conjecture by Aizenman pifl , 
stating that for a lattice of size L x . . . x L x (rL) (rL is the length in the 
spanning direction) 

Pk ~ e- ar (6) 

with 

a cx fc^- 1 ) for fc > 1. (7) 

Cardy's formula in d = 2 agrees with Eqs. (Q) and (Q) , for periodic transverse 
boundary conditions it is 

a = f( fc2 ~l) fc > 2 > d = 2 - (8) 

It has recently been generalized J25| to the case where the clusters are seperated 
by at least two lattice units (i.e., there are at least two non-intersecting paths 
on the dual lattice between any two clusters). In that case 

2tt /,3L, 1\ 

In order to test Eqs. (0)-@ for a wide range of values of fc and r, one has to 
simulate events with tiny probabilities, lnP^ ~ — 10 2 to — 10 3 . It is thus not 




Fig. 3. Configuration of 5 spanning site percolation clusters on a lattice of size 500x900. 
Any two clusters keep a distance of at least 2 lattice units. Lateral boundary conditions 
are periodic. The probability to find 5 such spanning clusters in a random disorder 
configuration is ~ 10 -92 . 
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surprising that previous numerical studies have verified Eq. (|^) only for small 
values of k, and have been unable to verify or disprove Eq. (R) p?^j27| . 

In order to demonstrate that such rare events can be simulated efficiently 
with the go-with-the-winners strategy, we show in Fig. 3 a rectangular lattice of 
size 500 x 900 with 5 spanning clusters which keep distances > 2. Eq.(||) predicts 
for it Pk = exp(— 3367r/5) w 10 -92 . This configuration was obtained by letting 
5 clusters grow simultaneously, using a standard cluster growth algorithm p8| , 
from the left border. Precautions were taken that they grow with the same speed 
towards the right, i.e. if one of them lagged behind, the growth of the others was 
stopped until the lagging cluster had caught up. If one of them died, or if two 
came closer than two lattice units, the entire configuration was discarded. If not, 
cloning was done as described in trick 1'. Note that here the growth was made 
without bias (it is not obvious what this bias should have been), and therefore 
the weight was just determined by the cloning. Due to that and since there are no 
Boltzmann weights, no configuration could get too high a weight, and therefore 
no pruning was necessary either. 

In this way we could check Eqs. (@),@ with high precision. We do not show 
these data. Essentially they just test the correctness of our algorithm. 

More interesting is the test of Eq. (0) for d — 3. Estimated probabilities for 
up to 16 spanning clusters, on lattices of sizes up to 64 x 64 x 2000, are shown 
on the left panel of Fig. 4. Note that probabilities now are as small as 10" 300 . 
Values of a obtained from these simulations and from similar ones at different 
lattice sizes (in order to eliminate finite-size corrections) are shown in the right 
panel of Fig. 4. The dashed line there is a fit [E9[ 



a = 2.76(fc 2 -0.61 



,3/4 



(10) 



Even if we should not take this fit too serious, we see clearly that a oc k 3 ^ 2 for 
k — > oo, in perfect agreement with Eq. (]?]). 




Fig. 4. Left panel: probabilities to have k spanning clusters on simple cubic lattices of 
size 64 x 64 x L, with k < 16 and L < 2000. Right panel: Decay exponents at versus k 
obtained from the data on the left panel, and from similar data with transverse lattice 
sizes 16 x 16, 32 x 32, and 128 x 128. The dashed line is a fit with a k ~ k 3/2 for large 
k. 
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3.2 Polymers 

Another big class of problems where the go-with-the-winners strategy is nat- 
urally applied are configurational statistics of long polymer chains. It is well 
known that linear polymers in good solvents form random coils which differ 
from random walks by having size 

R~N U (11) 

with v ± 1/2: v = 3/4 in 2 dimensions, and v w 0.588 in d = 3 
The canonical model which gives this anomalous scaling is the self avoiding 
random walk. Anomalous scaling laws in other universality classes are obtained 
by attaching polymers to impenetrable boundaries, to attractive walls, by adding 
monomer-monomer attraction, etc. Simulating long chain molecules was thus a 
vigorous problem since the very early days of electronic computers. 

The most straightforward method to simulate a self avoiding random walk on 
a regular lattice is to start from one end and to make steps in random directions. 
As long as no site is visited twice, every configuration should have the same 
weight. But as soon as a site is visited which has already been visited before, 
the energy becomes infinite because of hard core repulsion, the weight thus 
becomes zero, and the configuration can be discarded. This leads to exponential 
"attrition" - the number of generated configurations of length t decreases as 
C(t) ~ exp(— at) - and to a very inefficient code. 

A first proposal to avoid - or at least reduce - this attrition was made 
by Rosenbluth and Rosenbluth ]l^| . They proposed to bias the sampling by 
replacing steps to previously visited sites by steps to unvisited ones, if possible. 
Take e.g. a simple cubic lattice. Except for the very first step, there will be 
at most 5 free neighbours for the next move. If there is no free neighbour at 
any given moment, the configuration must be discarded. Otherwise, if there are 
m > 1 free neighbours, one selects one of them randomly and moves to it. At 
the same time, in order to compensate for this bias one multiplies the weight 
of the configuration by a factor oc m (the value of the proportionality constant 
is irrelevant for estimates of averages, and affects the partition sum in a trivial 
way only). 

Although this allows much longer chains to be simulated, the Rosenbluth 
method is far from perfect because it leads to very large weight fluctuations Q . 
As an alternative, enrichment was therefore proposed - in the form of trick 1 
of Sec. 2 - in (isj| . But more efficient than either is the full go-with-the-winners 
strategy with all three steps 1', 2, and 3. Population control (pruning/cloning) 
is of course done on the basis of full statistical weights, including both Boltz- 
mann and bias correction factors. This was first used in fl5| and later, with a 
different implementation, in jl(|. In the latter, it was called the 'pruned-enriched 
Rosenbluth method' (PERM). 

PERM is particularly efficient near the so-called 'theta-' or coil-globule tran- 
sition. This transition occurs when we start with a good solvent and make it 
worse, e.g. by lowering the temperature T. The repulsive interaction between 
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monomers and solvent molecules lead to an effective monomer-monomer attrac- 
tion which would like to make the polymer collapse into a dense globule. If T is 
sufficiently high, this is outweighed by the loss of entropy associated to the col- 
lapse. But at T < Tg, the entropy is no longer sufficient to prevent the collapse. 
According to the generally accepted scenario, the theta-point is a tricritical point 
with upper critical dimension d = 3 |50|,[n| . 

At the 3-d theta-point bias correction and Boltzmann factors nearly cancel. 
Therefore, long polymers have essentially random walk configurations with very 
small (logarithmic) corrections. Therefore, an unbiased random walk (with just 
a non-reversal bias: no 180 degree reversals are allowed) is already sufficient to 
give good statistics with very few pruning and enrichment events. In |]l6| chains 
made of up to 1,000,000 steps could be sampled with high statistics within 
modest CPU time. 

Applications of PERM to other polymer problems are treated in ||,||,||,|6H|,||,||j4^,|l|,||,|3lj4|. 

We want to discuss here just two application, namely the 'melting' (denatura- 
tion) of DNA Q and the low-energy ("native") states of heteropolymers|35| . 



DNA Melting As is well known, DNA in physiological conditions forms a 
double helix. Changing the pH value or increasing T can break the hydrogen 
bonds between the A-T and C-G pairs, and a phase transition to an open coil, 
with higher energy but also with higher entropy, occurs. This transition has 
been studied experimentally since about 40 years. It seems to be very sharp, 
experimental data are consistent with a first order transition |Q . While a second 
order transition would be easy to explain |46|^ | , constructing models which give 
first order transitions turned out to be much more difficult j4S| . 

The model studied in jl0| lives on a simple cubic lattice. A double strand of 
DNA with length N is described by a diblock copolymer of length 2N, made of 
N monomers of type A and N monomers of type B. All monomers have excluded 
volume interactions, i.e. two monomers cannot occupy the same lattice site, with 
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Fig. 5. Left panel: Specific heat as a function of e at T = 1, for single strand length 
iV = 500, . . . 3000. Right panel: Histograms of the number of contacts, for the same 
chain lengths, at e = e c . On the horizontal axis is plotted n/N as is appropriate for a 
first order transition. 
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Fig. 6. Average squared end-to-end 
distance -R^nd f° r various values of 
q = e E , plotted against IV on a 
double logarithmic scale. Since all 
curves are based on independent 
runs, their typical fluctuations rela- 
tive to each other indicate the order 
of magnitude of their statistical er- 
rors. 



one exception: The fc-th A-monomer and the fc-th B-monomer, with k — 1 . . . N 
being their index counted from the center where both strands are joint together, 
can occupy the same site. If they do so, then they even gain an energy — e. This 
models the binding of complementary bases. 

The surprising result of simulations of chains with N up to 4000 is that the 
transition is first order, but shows finite scaling behaviour as expected for a 
second order transition with cross-over exponent <p = 1. To demonstrate this, we 
first show in Fig. 5 the specific heat as a function of e at T = 1, for several chain 
lengths. We see a linear increase of the peak height with N which indicates a first 
order transition. In the right hand panel of Fig. 5 are plotted energy histograms 
for the same chain lengths. Energy is measured in terms of number n of contacts, 
divided by chain length TV. One sees two maxima, one at n = and the other at 
n w N/2, whose distance scales proportionally to N. This again points to a first 
order transition. But in contrast to usual first order transitions the minimum 
between these two maxima does not become deeper with increasing N. This is 
due to the absence of any analogon to a surface tension. Finally, in Fig. 6 we 
show average squared end-to-end distances. They obviously diverge for infinite 
iV when the transition point is approached from low temperatures. This is typical 
of a second order transition. A more detailed analysis shows that this divergence 
is -Rend ~ (e — e c )" ', as one would expect for a transition with <f> = 1 |fjb"[ . 

Native Configurations of Toy Proteins Models Predicting the native 
states of proteins is one of the most challenging problems in mathematical biol- 
ogy p9| . It is not only important for basic science, but could also have enormous 
technological applications. At present, such predictions are mostly done by ana- 
log methods, i.e. by comparing with similar amino acid sequences whose native 
states are already known. More direct approaches are hampered by two difficul- 
ties: 

• Molecular force fields are not yet precise enough. Energies between native 
and misfolded states are usually just a few eV, which is about the typical 
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precision of empirical potentials. Quantum mechanical ab initio calculations 
of large biomolecules are impossible today. 
• Even if perfect force fields were available, present day algorithms for finding 
ground states are too slow. One should add that the accepted dogma is that 
native states - at least of not too large proteins - are essentially energetic 
ground states. 

In view of the second problem, there exists a vast literature on finding ground 
states of artificially constructed heteropolymers. Most of these models are for- 
mulated on a (square or simple cubic) lattice and use only few monomer types. 
The best known example is the HP model if K. Dill J5(| which has two types 
of amino acids: hyrophobic (H) and hyrophilic (polar, P) ones. With most algo- 
rithms, one can find ground states typically for random chains of length up to 
- 50. 

In |}5| we have used PERM to study several sequences, of the HP model 
and of similar models, which had been discussed previously by other authors. 
In all cases we found the lowest energy states found also by these authors, but 
in several cases we found new lowest energy states. A particularly impressive 
example is a chain of length 80 with two types of monomers with somewhat 
artificial interactions: Two monomers on neighbouring lattice sites contribute 
an energy —1 if they are of the same type, but do not contribute any energy if 
they are different j5l| . A particular sequence was constructed such that it should 
fold into a bundle of four "helices" with an energy —94 plf . Even with a specially 



designed algorithm, the authors of 1 51 were not able to recover this state. With 
PERM we not only found it easily, we also found several lower states, the lowest 
one having energy —98 and having a completely different structure. Instead of 
being dominanted by a-helices, it has mostly /3-sheets (as far as these structures 




Fig. 7. Left: Putative native state of the "four helix bundle" sequence of |3l| . It has E — 
—94, fits into a rectangular box, and consists of three homogeneous layers. Structurally, 
it can be interpreted as four helix bundles. Right: True ground state with E = —98. 
Its shape is highly symmetric although it does not fit into a rectangular box. It is 
degenerate with other configurations not discussed here. 
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can be identified on a lattice), see Fig. 7. Since PERM gives not only the ground 
state but the full partition sum, we could also follow the transition between 
mostly helical states at finite T and the sheetlike ground state. We found a peak 
in the specific heat associated to this transition which could have been mistaken 
as a sign of a transition between a molten globule and the frozen native state. 

3.3 Lattice Animals (Randomly Branched Polymers) 

Consider the set of all connected clusters C n of n sites on a regular lattice, with 
the origin being one of these sites, and with a weight defined on each cluster. 
The (n-site) lattice animal problem is defined by giving the same weight to each 
cluster. The last requirement distinguishes animal statistics from statistics of 
percolation clusters. Take site percolation for definiteness, with 'wetting' proba- 
bility p. Then a cluster of n sites with b boundary sites carries a weight p n (l —p) b 
in the percolation ensemble, while its weight in the animals ensemble is inde- 
pendent of b. In the limit p — > this difference disappears obviously, and the 
two statistics coincide. Due to universality, we expect indeed that the scaling be- 
havior is the same for any value of p less than the critical percolation threshold 
p c . It is generally believed that lattice animals are a good model for randomly 
branched polymers Js^] . 

While there exists no simple and efficient algorithm for simulating large an- 
imals which also gives estimates for the partition sum, there exist very simple 
and efficient algorithms for percolation clusters. The best known is presumably 
the Leath algorithm psj] which constructs the cluster in a "breadth first" (see 
next section) way. 

Our PERM strategy consists now in starting off to generate subcritical per- 
colation clusters by the Leath method, and in making clones of those growing 
clusters which contribute more than average to the animal ensemble J|3| . Since 
we work at p < p c , each cluster growth would stop sooner or later if there were 
no enrichment. Therefore we do not need explicit pruning. The threshold W+ 
for cloning is chosen such that it depends both on the present animal weight and 
on the anticipated weight at the end of growth. 

Usually, with growth algorithms like the Leath method, cluster statistics 
is updated only after clusters have stopped growing. But, as outlined below, 
one can also include contributions of still growing clusters. For percolation, this 
reduces slightly the statistical fluctuations of the cluster size distributions, but 
the improvement is small. On the other hand, this improved strategy is crucial 
when using PERM to estimate animal statistics. 

Consider a growing cluster during Leath growth. It contains n wetted sites, b 
boundary sites which are already known to be non- wetted, and g boundary sites 
at which the cluster can still grow since their status has not yet been decided 
("growth sites"). This cluster will contribute to the percolation ensemble only 
if growth actually stops at all growth sites, i.e. with weight (1 — p) g . Since 
the relative weights of the percolation and animal ensembles differ by a factor 
(l- p )(b+g) ( S i nc e 

now b + g is the total number of boundary sites), this cluster 
has weight w(C) oc (1 — p)~ b in the animal ensemble. If we would use only this 
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Fig. 8. A typical lattice animal with 
8000 sites on the square lattice. 



weight as a guide for cloning, we would clone if w(C) is larger than some W+ 
which is independent of b and g, and which depends on n in such a way that 
the sample size becomes independent of n. But clusters with many growth sites 
will of course have a bigger chance to keep growing and will contribute more to 
the precious statistics of very large clusters. It is not a priori clear what is the 
optimal choice for W+ in view of this, but numerically we obtained best results 
for W+ oc (1 -p)B. 

In this way we were able to obtain good statistics for animals of several 
thousand sites, independent of the dimension of the lattice. A typical 2-d animal 
with 8000 sites is shown in Fig. 8. We were also able to simulate animal collapse 
(when each nearest neighbor pair contributes — e to the energy), and animals 
near an adsorbing surface. Details will be published in Ref.Q. 

4 Implementation Details 

In this section, the notation will be appropriate for the lamb-and-lions problem 
of Sec. 2, but all statements hold mutatis mutandis also for the other problems. 

4.1 Depth First Versus Breadth First 

As described in trick 1 of Sec. 2, original enrichment was implemented "breadth 
first" . There one keeps many replicas of the process simultaneous in the com- 
puter, and advances them simultaneous. This is also the traditional way of im- 
plementing evolutionary algorithms HQ]- There it is required for two reasons: 
Because of cross-over moves where two configurations ( "replicas" , "instances" , 
"individua" , ...) are combined to give a new configuration, and because of tour- 
nament selection where the less fit of a randomly selected pair is killed and 
replaced by the more fit. In the present case there are no cross-overs, and tour- 
nament selection is replaced by comparing the "fitness" w against thresholds 
W± which will be determined by some average fitness. 
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This allows us to use a "depth first" implementation where only a single 
replica is kept in computer memory at any given time, and only when this is 
pruned or has reached its final time i ma x, a new replica is started. The names 
"breadth first" and "depth first" come from searching rooted trees where the 
first searches the tree in full breadth before increasing the depth, while in the 
second one first follows a branch in full depth and only then considers alternative 
branches |Q^5|. 

The main advantages of depth first algorithms are reduced storage require- 
ments (which can be important even in present days of parallel machines where 
breadth first algorithms can be implemented by putting each configuration on its 
own node) and elegance of programming. While the 'natural' coding paradigmes 
for breadth first algorithms are iterations and first-in/first-out queues, for depth 
first approaches they are recursions and stacks. In order to implement the lamb- 
and-lions problem we need just a recursive function STEP(t,w,x,xi, ... xjv) whose 
arguments have the obvious meaning. When called, it increases t — > t+1, selects 
new positions from the neighbours of the previous ones, updates w accordingly, 
and calls itself either twice (cloning) , once (normal evolution) or not at all (prun- 
ing). A pseudocode for this is given in Jl^] . 

There is a folklore saying that recursions are inefficient in terms of CPU 
time , and large recursion depths should be avoided. It can be avoided since 
each recursion can also be re-coded as an iteration (FORTRAN 77, e.g., has 
no recursion and is yet a universal language). But the speed-up is negligible 
on modern compilers (less than 10%, typically), depths of 10 4 — 10 5 make no 
problems, and readability of the code is much worse if recursion is not used. 
We see only one reason for avoiding recursion, and that is its less efficient use of 
main memory. In very large problems where stack size limitations can be crucial, 
recoding in terms of iterations might be needed. 

4.2 Choosing W± 

In general, thresholds for pruning/cloning should not be too far apart since 
otherwise the weights fluctuate too much and most of the total weight is carried 
by few configurations only. We had best experiences with 3 < W+/W- < 10 in 
most applications, but other authors Q report good results also for W + /W- « 
100. Obviously, the precise value is not very important. 

More important is W+ itself. As a rule of thumb, it should be chosen such 
that the total number of configurations C (t) created at time t is independent of 
t. If C(t) decreases with t, most of the CPU time is spent on small t and the 
statistics at large t depend on only few realizations. Inversely, if C(t) increases 
with t, all configurations at large t are descendants of only few ancestors and 
are thus strongly correlated. 

There is a very simple way to guarantee the approximate (up to a factor ~ 1) 
constancy of C(t) Let us denote by Z(t) the path integral (or partition sum), 
and Z(t) its estimate from the current simulation, 




(12) 
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Here M is the number of starting configurations which have already been treated, 
and Wj(t) is the weight of the j-th configuration at time t. C(t) will be roughly 
independent of t if 

W± = c ± Z{t) (13) 

with c± being constants of order unity (typically, c + « l/c_ « [W-f/Wl] 1 ' 2 ). 
We thus start the simulation with some guess for W± (the precise values are 
largely irrelevant, any large values would also do) and replace them by Eq. ( p"3| ) 
as soon as there was already a configuration at t, i.e. as soon as we have Z(t) > 0. 

More sophisticated ways [ j35|]33]l to choose W± are needed only for very hard 
problems with excessive pruning and cloning. In this case, the above method 
would occasionally give excessively large "tours" (a tour is the set of all configu- 
rations which descend from the same ancestor, i.e. which are obtained by cloning 
from the same starting configuration). To cut them short, one should make W± 
larger than given by Eq. ([l3|) if a tour is already very large. We should however 
warn the reader that in such hard cases the estimates of partition sums are no 
longer reliable, and results should be taken with some suspicion. 

4.3 Choosing the Bias 

As a general rule, the bias should be such that the bias correction factor can- 
cels exactly the Boltzmann weight (if there is one) and minimizes the number 
of pruning/cloning events. A bad choice of the bias is immediately seen in an 
increase of these events, and in a decrease of the number of tours which reach 
large values of t. In t, a simulation corresponds essentially to a random walk 
with reflecting boundary at t = 0. While normal evolution steps correspond to 
forward steps in t, pruning events correspond to backward jumps to the last 
previous cloning time. A proper choice of W± eliminates any drift from this 
random walk, while a good bias maximized the effective diffusion constant. If 
W+ is chosen according to Eq. (f[3"|), the CPU time needed to create an inde- 
pendent configuration at large t increases essentially ~ t 2 , the prefactor can be 
substantially decreased by choosing a good bias. 

Unfortunately, there is no universal recipe for such a good bias. There is 
a general prescription in the case of diffusion quantum Monte Carlo (see next 
section) and for related Markov processes, but even this is usually not easy to 
implement. In other cases such as polymers with self avoidance one has only 
heuristics. Sometimes even the algorithm without bias is already very efficient, 
such as for multiple spanning percolation clusters. In other cases, as in the lamb- 
and-lions problem, the qualitative properties of the bias are obvious, but for its 
quantitative implementation we had used trial and error. 

One possible way to determine a good bias (or "guiding" , as it is sometimes 
called) is to look k steps ahead. For a polymer, e.g., one might try all extensions 
of the chain by k monomers, and decides on the success of these extensions 
which single step to take next. This scanning method j57|] is efficient in guiding 
the growth, but also very time consuming: The effort increases exponentially 
with k. For polymers at low temperatures, where Boltzmann factors can become 
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very large, this may be efficient nevertheless. But for a-thermal SAWs and for 
lattice polymers in the open coil phase, another method is much more efficient. 

Markovian Anticipation fn this alternative strategy |57j to guide polymer 
grow, one essentially does not look forward k steps, but backward. Thus we 
remember during the growth the last k steps. On a lattice with coordination 
number A/", this means we label the present configuration by an integer i = 
1, . . .J\f k . Assume now that the next step is in direction j, j = 1, . . .TV. During 
the initial steps of the simulation (or during an auxiliary run) we build up a 
histogram H(i,j) of size J\[( k+1 > . There we add up the weights with which all 
configurations with history (i,j) between the steps n — k and k contribute to 
the partition sum of chains with length n + m, with m 3> 1 (we typically use 
m « 100- 200). The ratio 

WB -^f) ,14) 

is then an estimate of how efficient the extension j was in the long run. After 
some obvious modifications taking into account that there is no history yet for 
the first k steps, and that no anticipation is useful for the last few steps, we use 
p(j\i) (which is properly normalized already!) as the probability with which we 
make step j, given the history i. 

Note that this can also be used, e.g., for stretched polymers where the p(j\i) 
are not isotropic, and where one can anticipate that the next monomer should 
be added preferentially in the direction of stretching. 

4.4 Error Estimates and Reliability Tests 

Errors can in principle be estimated a priori and a posteriori. In the former case 
one knows them even before making the simulations. For instances, if one draws 
n realizations of a random number with variance a, the average has variance 
a/N. A posteriori errors, in contrast, are obtained from fluctuations between 
the different realizations. 

A priori errors for go-with-the-winners simulations are possible jlTj but diffi- 
cult because the generated sample is correlated. Indeed, making such estimates 
was the main objective of fll7| , but the compromizes as regards efficiency are 
such that the results obtained there seem not very practical. 

A posteriori errors can be made easily by dividing a long run into several 
bunches, computing averages over each bunch, and studying the fluctuations 
between them. This is essentially also the strategy in standard Metropolis sim- 



ulations, but here the situation is even simpler. Since each 'tour' (see Sec.4.2) 
is independent from any other, the break-up into bunches just has to be be- 
tween tours. No problem due to correlations of uncertain range as in Metropolis 
simulations occurs here. 

Nevertheless, the problems of critical slowing down and of being trapped in 
local free energy minima which plague Metropolis simulations are not absent in 
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go-with-the-winners simulations. They just appear in new guises. Namely, single 
tours can become extremely large. If that happens, nearly the entire weight 
accumulated during a long simulation can be carried by a single tour or, what is 
even worse, the tours of really large weight have not been found at all. The latter 
is the analogon to not yet having reached equilibrium in Metropolis simulations. 

Although this is first of all a problem of error bars, it can easily, if one is 
not very careful, turn into a source of systematic errors. This is because one is 
primarily not interested in the partition sum (which is always sampled without 
a bias), but in its logarithm or in derivatives thereof. Consider e.g. a situation 
where we want to make several independent runs since we want to make sure that 
we have made everything right. From each simulation we estimate a free energy, 
and then we take their average value as our final estimate. If the problem is really 
hard, the fluctuations of the partition sums will be non-Gaussian, with very many 
small downward fluctuations compensated by few large positive fluctuations. By 
taking the logarithm, the latter are cut down, and a negative bias results. 

There is no fool-proof remedy against this danger. But there is an easy and 
straightforward way to check that at least that part of phase space which has 
been visited at all has been sampled sufficiently during a single run. For this, we 
make a histogram of tour weights on a logarithmic scale, P(log(u>)), and compare 
it with the weighted histogram u;P(log(u>)). If the latter has its maximum for 
values of log(u;) where the former is already large (i.e. where the sampling is 
already sufficient), we are presumably on the safe side. However, if wP(log(wj) 
has its maximum at or near the upper end of the sampled range, we should be 
skeptical. 

As an example, we show results for a self avoiding 2-d walk in a random 
medium p9| . This medium is an infinite square lattice with (frozen) random 
energies Ei on each site i. In particular, Ei is either —1 (with probability p) 
or (with probability 1 — p). The polymer is free to float in the entire lattice. 
Previous simulations p8j had suggested that for any finite p there is a phase 
transition at T = T c (p), maybe because the polymer becomes localized in an 
"optimal" part of the lattice for T <T C . 
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Fig. 9. Absolute value of U for p = 
1/4 and N = 200. The continuous 
line is the exact theoretical result. 
Pluses (+) have low statistics (ca. 
10 to 20 min CPU time per point); 
crosses (x) have medium statistics 
(about a factor 10 more); squares 
have about a factor 20 more statis- 
tics than crosses. 
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Fig. 10. Full lines are histograms of logarithms of tour weights for 1/kT — 0.92 (left 
panel) resp. 2.30 (right), normalized as tours per bin. Broken lines show the corre- 
sponding weighted distributions, normalized so as to have the same maximal heights. 
Weights W are only fixed up to a /3-dependent multiplicative constant. 



But one can show rigorously that no such transition can exist |39| |. In order 
to understand the source of the problem, we made simulation with PERM and 
monitored the distribution of tour weights. Results are shown in Figs. 9 and 
10. In Fig. 9 we show the average values of the energy^], \U\ = —{J2iec ^*) ^ or 
p = 1/4 and chain length TV = 200. Three curves obtained from simulations are 
shown together with a curve obtained analytically. The fact the three curves, ob- 
tained with vastly different statistics, agree with each other but deviate from the 
theoretical curve at the same value of T indicates the supposed phase transition. 
But histograms obtained in the regions where theory and simulations (dis-)agree 
(see Fig. 10) show clearly that the simulations for 1/kT > 2 are not reliable (right 
panel) while those for 1/kT < 1.5 are. 



5 Diffusion Quantum Monte Carlo 

For completeness we sketch here shortly the main idea of diffusion type quantum 
MC (QMC) simulations |j],[l0 11 , to show how they are related to the previous 
calculations and to understand why a perfect bias is in principle possible in QMC 
but not in classical applications. 

We start from the simplest version of a time dependent Schrodinger equation, 

idj)(x,t)/dt= -(2m) _1 VV(M) +V(x)ip(x,t), (15) 

and we are interested in finding its ground state energy, i.e. we want to solve the 
time-independent Schrodinger equation 

E mia iP(x) = -(2m)- 1 V 2 V(a;) + V{x)^{x), (16) 

1 Here, C denotes the set of sites occupied by the walk, and averaging is done over all 
walks and all disorder realizations 
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for the smallest eigengenvalue E m in- Replacing t by an imaginary "time" and 
(2m) _1 by a diffusion coefficient D we end up at a diffusion equation 

dip(x, t)/dt = D\7 2 ijj(x, t) - V(x)<ip(x, t) (17) 

with an external source/sink V(x) and ip viewed as a classical density. The 
ground state energy of the original problem is now transformed into the slowest 
relaxation rate. If we want to simulate this by diffusing particles, we can take 
the last term into account by either killing particles (if V > 0) and cloning them 
(if V < 0), or by giving them a weight exp[J dtV(x(t))}. Neither is very efficient. 
For efficiency, we should rather replace the random walk by a biased ( "guided" ) 
motion for which neither weighting nor killing/cloning is needed. 
For this purpose we choose a "guiding function" g(x) and write 

i/>(x,t)=p(x,t)/g(x). (18) 
leads then to the following equation for p: 

dp/dt = DV 2 p - [V(x) - D Y ^ 1 \P ~ V (W^] p) (19) 

This is now a diffusion equation with drift (last term) and with a modified 
source/sink term. If the latter is constant, i.e. 

DV 2 g(x) - V(x) g(x) = const g(x), (20) 

then no killing/pruning is needed and the weight increase/decrease is uniform 
and thus trivial. But Eq. ( p0| ) is just the time-independent Schrodinger equation, 
Eq. (|l^), we wanted solve. It seems that we have gained nothing. For an optimal 
implementation, we have to know already the solution we want to get. 

Things are of course not so bad since we can proceed iteratively: start with 
a rough guess for g{x), obtain with it an estimate for ip(x), use it as the next 
guess for g(x), etc. 

A crucial observation now is the following: the density p(x) of the guided 
diffusers is, if g(x) satisfies Eq. (pp|), just equal to the quantum mechanical 
density^, \ip(x)\ 2 . Thus random sampling of Eq. ([n]) corresponds precisely to 
random sampling of the quantum-mechanical density. We thus really have solved 
the problem of importance sampling. 

Note that if we had started instead with Eq. ( |l7| ) as a classical problem, and 
were interested in the classical density, we would not make perfect importance 
sampling: the particles would be sampled in the simulation not with the density 
they should have. Although using Eq. for the guiding function would still 
be formally correct, it would not lead to minimal statistical fluctuations. 

2 Note that ip(x) is real here since we are interested in the ground state. However, a 
similar derivation is possible when using a time-dependent guiding function g(x,t). 
Then Eq. (fed) becomes the adjoint time-dependent Schrodinger equation. 



20 



Peter Grassberger and Walter Nadler 



6 Conclusion 

We have seen that stochastic simulations not following the traditional Metropolis 
scheme can be very efficient. We have illustrated this with a wide range of prob- 
lems. Conspicuously, the Ising model was not among them. The reason is simply 
that no go-with-the-winners algorithm for the fsing model has been proposed 
which is more efficient than, say, the Swendsen-Wang algorithm. But there is 
no reason why such an algorithm should not exist. In principle, the go-with-the- 
winners strategy has at least as wide a range of applications as the Metropolis 
Metropolis scheme. Its only requirement is that instances (configurations, histo- 
ries, ...) are built up in small steps, and that the growth of their weights during 
the early steps of this build-up is not too misleading. 

The method is not new. It has its roots in algorithms which are regularly used 
since several decades. Some of them, like genetic algorithms, are familiar to most 
scientists, but it is in general not well appreciated that they can be made into 
a general purpose tool. And it seems even less appreciated how closely methods 
developed for quantum MC simulations, polymer simulations, and optimization 
methods are related. I firmly believe that this close relationship can be made 
use of in many more applications to come. 
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